AITopics | equilibrium policy

equilibrium policy

On Imitation in Mean-field Games

Neural Information Processing Systems | Feb-15-2026, 12:19:29 GMT

In this paper, departing from the existing literature on IL for MFGs, we introduce a new solution concept called the Nash imitation gap.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Robots (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics

Magnino, Lorenzo, Shao, Kai, Wu, Zida, Shen, Jiacheng, Laurière, Mathieu

arXiv.org Artificial Intelligence | Oct-28-2025

Mean field games (MFGs) have emerged as a powerful framework for modeling interactions in large-scale multi-agent systems. Despite recent advancements in reinforcement learning (RL) for MFGs, existing methods are typically limited to finite spaces or stationary models, hindering their applicability to real-world problems. This paper introduces a novel deep reinforcement learning (DRL) algorithm specifically designed for non-stationary continuous MFGs. The proposed approach builds upon a Fictitious Play (FP) methodology, leveraging DRL for best-response computation and supervised learning for average policy representation. Furthermore, it learns a representation of the time-dependent population distribution using a Conditional Normalizing Flow. To validate the effectiveness of our method, we evaluate it on three different examples of increasing complexity. By addressing critical limitations in scalability and density approximation, this work represents a significant advancement in applying DRL techniques to complex MFG problems, bringing the field closer to real-world multi-agent systems.
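The Fictitious Play loop named in this abstract can be illustrated on a much smaller problem. The sketch below is not the paper's algorithm: it assumes a hypothetical static two-state crowd-aversion game, uses an exact best response in place of deep RL, and keeps a running average in place of the supervised average-policy network and the Conditional Normalizing Flow. It only shows the best-respond-then-average structure; the names `best_response` and `fictitious_play` are illustrative.

```python
def best_response(avg_mu0: float) -> float:
    """Fraction of best-responding agents that pick state 0.

    Assumed toy cost: the cost of a state equals its occupancy, so every
    agent moves to the less crowded state (ties go to state 0).
    """
    return 0.0 if avg_mu0 > 0.5 else 1.0


def fictitious_play(mu0: float, n_iter: int = 1000) -> float:
    """Average successive best responses with step size 1/(k+1)."""
    avg_mu0 = mu0
    for k in range(1, n_iter + 1):
        br = best_response(avg_mu0)
        # Running average: the new best response gets weight 1/(k+1).
        avg_mu0 += (br - avg_mu0) / (k + 1)
    return avg_mu0


avg = fictitious_play(0.9)
print(f"averaged occupancy of state 0: {avg:.4f}")
```

In this toy game the individual best responses keep flipping between the two states, while their average settles near an even split, which is the equilibrium of the assumed cost.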

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

arXiv:2510.22158

Country:

Europe (0.46)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Xu, Kaixuan, Chai, Jiajun, Li, Sicheng, Fu, Yuqian, Zhu, Yuanheng, Zhao, Dongbin

arXiv.org Artificial Intelligence | Jun-24-2025

Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
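The autoregressive factorization described in this abstract turns one joint multi-unit action into a chain of unit-level decisions, so the joint probability is a product of per-unit conditionals. The sketch below illustrates only that chain rule. The order vocabulary `UNIT_ACTIONS` and the rule inside `unit_policy` are hypothetical stand-ins for a fine-tuned LLM and are not taken from DipLLM.

```python
from itertools import product

UNIT_ACTIONS = ["hold", "move", "support"]  # hypothetical unit-level orders


def unit_policy(unit_index, previous_orders):
    """Hypothetical conditional distribution pi(a_i | a_1, ..., a_{i-1}).

    Toy rule: repeating the previous unit's order gets half the weight.
    unit_index is unused here; a real model would condition on it.
    """
    weights = [
        1.0 if (previous_orders and a == previous_orders[-1]) else 2.0
        for a in UNIT_ACTIONS
    ]
    total = sum(weights)
    return {a: w / total for a, w in zip(UNIT_ACTIONS, weights)}


def joint_probability(orders):
    """Chain rule: multiply the unit-level conditionals in order."""
    p = 1.0
    for i, a in enumerate(orders):
        p *= unit_policy(i, orders[:i])[a]
    return p


n_units = 3
total_mass = sum(
    joint_probability(list(o)) for o in product(UNIT_ACTIONS, repeat=n_units)
)
print(f"probability mass over all joint actions: {total_mass:.6f}")
```

Each unit-level decision ranges over only `len(UNIT_ACTIONS)` options, while the joint action space grows exponentially with the number of units. That gap is the motivation the abstract gives for the factorization.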

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

arXiv:2506.09655

Country:

Atlantic Ocean > North Atlantic Ocean (1.00)
Europe > Italy (0.93)
Asia > Middle East > Republic of Türkiye (0.46)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Online Competitive Information Gathering for Partially Observable Trajectory Games

Krusniak, Mel, Xu, Hang, Palermo, Parker, Laine, Forrest

arXiv.org Artificial Intelligence | Jun-3-2025

Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space, and through a series of approximations, we present an online method for computing rational trajectory plans in these games which leverages particle-based estimations of the joint state space and performs stochastic gradient play. We also provide the necessary adjustments required to deploy this method on individual agents. The method is tested in continuous pursuit-evasion and warehouse-pickup scenarios (alongside extensions to $N > 2$ players and to more complex environments with visual and physical obstacles), demonstrating evidence of active information gathering and outperforming passive competitors.

artificial intelligence, machine learning, particle, (16 more...)

arXiv.org Artificial Intelligence

arXiv:2506.01927

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Distributed Offloading in Multi-Access Edge Computing Systems: A Mean-Field Perspective

Aggarwal, Shubham, Zaman, Muhammad Aneeq uz, Bastopcu, Melih, Ulukus, Sennur, Başar, Tamer

arXiv.org Artificial Intelligence | Jan-30-2025

Multi-access edge computing (MEC) technology is a promising solution to assist power-constrained IoT devices by providing additional computing resources for time-sensitive tasks. In this paper, we consider the problem of optimal task offloading in MEC systems with due consideration of the timeliness and scalability issues under two scenarios of equitable and priority access to the edge server (ES). In the first scenario, we consider a MEC system consisting of $N$ devices assisted by one ES, where the devices can split task execution between a local processor and the ES, with equitable access to the ES. In the second scenario, we consider a MEC system consisting of one primary user, $N$ secondary users and one ES. The primary user has priority access to the ES while the secondary users have equitable access to the ES amongst themselves. In both scenarios, due to the power consumption associated with utilizing the local resource and task offloading, the devices must optimize their actions. Additionally, since the ES is a shared resource, other users' offloading activity serves to increase latency incurred by each user. We thus model both scenarios using a non-cooperative game framework. However, the presence of a large number of users makes it nearly impossible to compute the equilibrium offloading policies for each user, which would require a significant information exchange overhead between users. Thus, to alleviate such scalability issues, we invoke the paradigm of mean-field games to compute approximate Nash equilibrium policies for each user using their local information, and further study the trade-offs between increasing information freshness and reducing power consumption for each user. Using numerical evaluations, we show that our approach can recover the offloading trends displayed under centralized solutions, and provide additional insights into the results obtained.

artificial intelligence, optimization problem, secondary user, (17 more...)

arXiv.org Artificial Intelligence

arXiv:2501.18718

Country:

North America > United States > Illinois (0.04)
Africa > South Africa > Western Cape > Cape Town (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Communications > Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Contextual Bandits for Evaluating and Improving Inventory Control Policies

Foster, Dean, Jia, Randy, Madeka, Dhruv

arXiv.org Machine Learning | Oct-24-2023

Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventory control policy, in particular to see if there is room for improvement. We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward. We provide a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.

bandit, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

arXiv:2310.16096

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Multi-agent Attention Actor-Critic Algorithm for Load Balancing in Cellular Networks

Kang, Jikun, Wu, Di, Wang, Ju, Hossain, Ekram, Liu, Xue, Dudek, Gregory

arXiv.org Artificial Intelligence | Mar-14-2023

To address this problem, BSs can work collaboratively to deliver a smooth migration (or handoff) and satisfy the UEs' service requirements. This paper formulates the load balancing problem as a Markov game and proposes a Robust Multi-agent Attention Actor-Critic (Robust-MA3C) algorithm that can facilitate collaboration among the BSs (i.e., agents). In particular, to solve the Markov game and find a Nash equilibrium policy, we embrace the idea of adopting a nature agent to model the system uncertainty. Moreover, we utilize the self-attention mechanism, which encourages high-performance BSs to assist low-performance BSs. In addition, we consider two types of schemes, which can facilitate load balancing for both active UEs and idle UEs. We carry out extensive evaluations by simulations, and the simulation results illustrate that, compared to state-of-the-art MARL methods, the Robust-MA3C scheme can improve the overall performance by up to 45%.

agent, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

arXiv:2303.08003

Country:

North America > Canada (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Industry:

Energy > Power Industry (0.88)
Telecommunications (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization

Verma, Tanvi, Varakantham, Pradeep

arXiv.org Artificial Intelligence | Dec-27-2022

In multi-agent systems with a large number of agents, the contribution of each agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber, Deliveroo). In this paper, we consider such multi-agent systems where each agent is self-interested and takes a sequence of decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties for equilibrium solutions in the SNCG model with non-atomic and also nearly non-atomic agents. With those key equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes variance across values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and also a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers, while still providing higher joint revenues than leading approaches.
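The quantity this abstract says is minimized, the variance across values of agents that share a state, can be written down in a few lines. The sketch below is an assumption about how such a term could be computed, not the paper's training objective. The function name `local_variance_penalty`, the grouping by a discrete state label, and the averaging over states are all illustrative choices.

```python
from collections import defaultdict
from statistics import pvariance


def local_variance_penalty(states, values):
    """Mean, over states, of the population variance of agent values.

    states[i] is the (hashable) local state of agent i and values[i] is
    that agent's estimated value. A penalty of 0 means all agents that
    share a state have the same value.
    """
    groups = defaultdict(list)
    for s, v in zip(states, values):
        groups[s].append(v)
    return sum(pvariance(vs) for vs in groups.values()) / len(groups)


# Two agents in zone "A" disagree on value; two agents in zone "B" agree.
penalty = local_variance_penalty(["A", "A", "B", "B"], [1.0, 3.0, 2.0, 2.0])
print(penalty)
```

A term like this could be added to a learning loss so that updates are pushed toward value equality within a state, which is the equilibrium property the abstract refers to.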

agent, local state, variance, (16 more...)

arXiv.org Artificial Intelligence

arXiv:2212.13379

Country: Asia > Singapore (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)

Learning in Discounted-cost and Average-cost Mean-field Games

Anahtarcı, Berkay, Karıksız, Can Deha, Saldi, Naci

arXiv.org Artificial Intelligence | Nov-10-2022

We consider learning approximate Nash equilibria for discrete-time mean-field games with nonlinear stochastic state dynamics subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator, whose fixed point is a mean-field equilibrium (i.e. equilibrium in the infinite population limit). We first prove that this operator is a contraction, and propose a learning algorithm to compute an approximate mean-field equilibrium by approximating the MFE operator with a random one. Moreover, using the contraction property of the MFE operator, we establish the error analysis of the proposed learning algorithm. We then show that the learned mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games.
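Because the abstract relies on the MFE operator being a contraction, its fixed point can in principle be reached by repeated application of the operator. The sketch below shows that idea on a hypothetical two-state game with a softmax response. The operator `mfe_operator` and its parameter `beta` are assumptions made for illustration: they are not the operator defined in the paper and involve no learning or random approximation.

```python
import math


def mfe_operator(mu0: float, beta: float = 1.0) -> float:
    """Toy operator: new fraction of agents in state 0 given the current one.

    Assumed model: the cost of a state is its occupancy and agents choose
    a state by softmax over negative costs. The slope is at most beta / 2
    in absolute value, so beta = 1 gives a contraction.
    """
    return 1.0 / (1.0 + math.exp(-beta * (1.0 - 2.0 * mu0)))


def iterate_to_fixed_point(mu0: float, tol: float = 1e-12, max_iter: int = 1000):
    """Apply the operator until successive iterates differ by less than tol."""
    for k in range(max_iter):
        nxt = mfe_operator(mu0)
        if abs(nxt - mu0) < tol:
            return nxt, k + 1
        mu0 = nxt
    return mu0, max_iter


mu_star, n_iter = iterate_to_fixed_point(0.9)
print(f"fixed point ~ {mu_star:.6f} after {n_iter} iterations")
```

The contraction property is what guarantees that the iterates converge to a unique fixed point from any starting distribution in this toy setting.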

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

arXiv:1912.13309

Country:

North America > United States > New York (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
